CLAIMS 

What is claimed is: 

1. A system for distributed convolution of samples comprising:
a sample manager operable to: 

calculate partial sums for a portion of the samples within a convolution kernel, 
wherein the partial sums comprise 1) a sum of weights determined for 
locations of the samples in the portion of samples and 2) a sum of 
weighted sample values for the portion of samples, 

add the partial sums to any previously accumulated partial sums, and 

output new accumulated partial sums; 
a first partial sums bus connected to the sample manager for receiving any
previously accumulated partial sums; and
a second partial sums bus connected to the sample manager for outputting the new
accumulated partial sums.

2. The system of claim 1, further comprising a chain of sample managers, wherein
each sample manager is connected to the next sample manager by a partial sums 
bus, and wherein a last member of the chain calculates pixel values from the final 
accumulated partial sums. 

3. The system of claim 2, wherein for each sample manager the corresponding portion 
of samples resides in a sub-set of screen space and the sub-sets are finely 
interleaved across screen space. 

4. The system of claim 3, comprising 16 sample managers, wherein each sample
manager addresses one sample bin in a 4 by 4 array of sample bins that is repeated 
across screen space. 

5. The system of claim 2, further comprising a plurality of groups of sample managers
and a plurality of partial sums buses interconnecting the groups, wherein each group 
is a chain of sample managers. 

6. The system of claim 5, wherein for each sample manager the corresponding portion 
of samples resides in a corresponding sub-set of screen space and the sub-sets are 
interleaved across screen space, wherein for a system of 4 groups of 4 sample 
managers in a chain, each sample manager within a group addresses one sample bin 
in a 2 by 2 array of sample bins that is repeated across a 16 by 16 array of sample
bins, and wherein four permutations of each of the four different 16 by 16 arrays 
(one for each group) are combined to form a 64 by 64 array of sample bins that is 
repeated across screen space. 

7. The system of claim 5, wherein a last sample manager in each group calculates 
pixel values from the final accumulated partial sums. 

8. A system for distributed filtering of samples within a convolution kernel to
calculate values for a corresponding pixel comprising:
a chain of means for calculating partial sums,
wherein each member of the chain calculates partial sums for a corresponding
portion of the samples within the convolution kernel for the pixel,
wherein the partial sums calculated by each member of the chain comprise 1)
a sum of weights determined for the sample locations in the
corresponding portion of samples and 2) a sum of weighted sample
values for the corresponding portion of samples,
wherein each member of the chain adds the calculated partial sums to
corresponding accumulated partial sums and outputs the new
accumulated partial sums,
and wherein a last member of the chain calculates pixel values from the final
accumulated partial sums; and
means for passing accumulated partial sums from one member of the chain to the
next.

9. The system of claim 8, wherein a sample comprises parameter values for color and 
transparency, and wherein partial sums comprise partial sums for each of the 
parameter values. 

10. The system of claim 8, wherein each means for calculating partial sums is assigned 
one or more sample bins from an interleaved array of sample bins and the 
interleaved array of sample bins is repeated across screen space. 

11. The system of claim 8, further comprising a plurality of means for storing sample
values, wherein each means for storing is dedicated to a different member of the 
chain. 

12. The system of claim 11, further comprising a plurality of means for rendering 
samples, wherein each sample generated is stored in one of the means for storing. 

13. The system of claim 11, further comprising a means for converting the pixel values 
to video output signals. 

14. A system for distributed filtering of samples comprising: 

a chain of N sample managers (k), wherein k is an integer with range 0 to N-1;
a partial sums bus connecting each sample manager (k) in the chain of sample
managers to the next sample manager (k+1);
wherein sample manager (k) is operable to:
receive accumulated partial sums from a prior sample manager (k-1), if k is
greater than zero,
calculate partial sums for a set of samples, wherein the set of samples are
within a sub-set of screen space assigned to sample manager (k), and
wherein the set of samples are located within a convolution kernel
defined for a pixel,
add the partial sums to the accumulated partial sums, and
output the accumulated partial sums to sample manager (k+1), if k is less than
N-1.

15. The system of claim 14, further comprising a memory (k) dedicated to sample 
manager (k), wherein memory (k) stores sample data for samples in the sub-set of 
screen space assigned to the sample manager (k). 

16. The system of claim 15, wherein each memory (k) comprises a plurality of memory 
units. 

17. The system of claim 15, wherein sample manager (k) is operable to read the set of 
samples from the memory (k). 

18. The system of claim 14, wherein a designated sample manager is operable to 
calculate pixel values from the final accumulated partial sums. 

19. The system of claim 18, wherein partial sums comprise partial sums for each
sample parameter value. 

20. The system of claim 19, wherein each pixel parameter value equals a corresponding 
final accumulated sum of weighted sample parameter values for each sample within 
the convolution kernel divided by an accumulated sum of weights for locations of 
each sample within the convolution kernel. 

21. A system for distributed convolution of samples for a pixel comprising: 

a set of N memories, wherein each memory (k) stores sample data for samples in a
different sub-set (k) of screen space, wherein k is a non-negative integer;
a sequence of N filter units, wherein each filter unit (k) is directly connected to a
dedicated memory (k); and
a set of N-1 partial sums buses connecting each filter unit to the next filter unit in
the sequence,
wherein each filter unit (k) is operable to:
receive accumulated partial sums from a prior filter unit (k-1), if k is greater
than zero,
read a set of samples from a memory (k), wherein the set of samples are
within sub-set (k) of screen space assigned to the memory (k), and
wherein the set of samples are located within a convolution kernel
defined for the pixel,
calculate partial sums for the set of samples,
add the partial sums to the accumulated partial sums, and
output the accumulated partial sums to the next filter unit in the sequence of
filter units if k is less than N-1,
and wherein the last filter unit (N-1) in the sequence calculates pixel values from
the final accumulated sums.

22. The system of claim 21, wherein N = 16 and the 16 different sub-sets of screen
space are finely interleaved so that each filter unit addresses one sample bin in a 
4 by 4 array of sample bins that is repeated across screen space. 

23. A system for distributed convolution of samples for a pixel comprising:

a set of N memories, wherein each memory (k) stores sample data for samples
located within sub-set (k) of screen space, wherein k is an integer ranging
from 0 to N-1;
a sequence of N filter units arranged in M groups, wherein each filter unit (k) has a
dedicated memory (k);
a first set of partial sums buses that connect each filter unit within a group in series;
a second set of partial sums buses that connect one group of filter units to another
group of filter units;
wherein filter units within a group are operable to:
receive accumulated partial sums from a prior filter unit within the group or
from another group of filter units,
read a set of samples from a corresponding memory (k), wherein the set of
samples are within the sub-set (k) of screen space assigned to the
memory (k), and wherein the set of samples are located within a
convolution kernel defined for the pixel,
calculate partial sums for the set of samples,
add the partial sums to the accumulated partial sums, and
output the accumulated partial sums to the next filter unit within the group of
filter units, or to another group of filter units,
and wherein a last filter unit within a group calculates pixel values from the final
accumulated sums, after all samples within the convolution kernel have been
processed.

24. The system of claim 23, wherein N = 16, M = 4, and the 16 different sub-sets of 
screen space are interleaved, wherein each filter unit within a group addresses one 
sample bin in a 2 by 2 array of sample bins that is repeated across a 16 by 16 array 
of sample bins, and wherein four permutations of each of the four different 16 by 16 
arrays (one for each group) are combined to form a 64 by 64 array of sample bins 
that is repeated across screen space. 

25. A method for distributed filtering of samples comprising: 

calculating first partial sums in a first filter unit for a first set of samples, wherein 
the first set of samples is a portion of the samples located within a convolution 
kernel defined for a pixel location, and wherein the first set of samples are 
within a region of screen space assigned to the first filter unit; and 

sending the first partial sums to a sequence of additional filter units, wherein each
of the additional filter units:
receives accumulated partial sums from the previous filter unit,
calculates new partial sums for a corresponding set of samples located within
the convolution kernel and within a corresponding region of screen
space assigned to the filter unit,
adds the new partial sums to the accumulated partial sums, and
sends the new accumulated partial sums to the next filter unit in the sequence
of filter units; and
wherein a last filter unit in the sequence of filter units:
receives accumulated partial sums from the previous filter unit in the
sequence,
calculates new partial sums for a corresponding portion of samples, and
adds the new partial sums to the accumulated partial sums to complete final
accumulated partial sums.

26. The method of claim 25, wherein a convolution kernel defined for a pixel is a
region in screen space within a defined boundary and centered on the pixel location
in screen space.

27. The method of claim 25, wherein partial sums comprise 1) a sum of weighted
sample values for the set of samples (a sum of the products of each sample value
and a determined weight for the location of each sample) and 2) a sum of the
weights determined for the locations of each sample.

28. The method of claim 27, wherein a weight for the location of each sample value is
determined by a weight function selected from a set of functions comprising a box
filter, pyramid filter, circular filter, cone filter, Gaussian filter, and sinc filter.

29. The method of claim 27, wherein sample values comprise color values and 
transparency. 

30. The method of claim 25, wherein the last filter unit of the sequence of filter units 
calculates pixel values from the final accumulated sums. 

31. The method of claim 30, wherein pixel values equal a sum of weighted sample
values (for samples within the convolution kernel) times a reciprocal of a sum of 
the weights (determined for the locations of the samples) for each parameter value 
comprising one or more of color values and transparency. 

32. The method of claim 25, wherein each set of samples is within a screen space
region (k) assigned to the corresponding filter unit (k), and wherein the set of
samples are read from a memory (k) dedicated to the filter unit (k).

33. The method of claim 25, wherein for each filter unit (k) the corresponding set of
samples resides in a sub-set (k) of screen space and the sub-sets are finely
interleaved across screen space.

34. The method of claim 33, wherein for a system of 16 filter units, each filter unit (k)
addresses one sample bin in a 4 by 4 array of sample bins that is repeated across
screen space.

35. The method of claim 25, wherein the filter units are sub-divided into a plurality of 
groups of filter units and a plurality of partial sums buses interconnect the groups, 
and wherein each group is a chain of filter units. 

36. The method of claim 35, wherein a last filter unit in each group calculates pixel
values from the final accumulated partial sums corresponding to the pixel.

37. The method of claim 35, wherein for each filter unit (k) the set of samples resides in
a sub-set (k) of screen space and the sub-sets are interleaved across screen space.

38. The method of claim 37, wherein for a system of 4 groups of 4 filter units in a
chain, each filter unit within a group addresses a sample bin in a 2 by 2 array of
sample bins that is repeated across a 16 by 16 array, and wherein four permutations
of each of the four different 16 by 16 arrays (one for each group) are combined to
form a 64 by 64 array of sample bins that is repeated across screen space.
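
Illustrative sketch (not part of the claims). The chained accumulation recited in the claims above can be modeled in software roughly as follows. This is a minimal sketch under stated assumptions, not the claimed hardware: the names Sample, PartialSums, manager_for, manager_step and filter_pixel are hypothetical, sample bins are assumed to be one unit square of screen space, the 4 by 4 interleave is assumed to number bins in row-major order, the kernel is assumed circular, and a Gaussian is chosen arbitrarily as the weight function. A single shared sample list stands in for the per-unit dedicated memories.

```python
import math
from dataclasses import dataclass


@dataclass
class Sample:
    x: float          # screen-space position of the sample
    y: float
    values: tuple     # parameter values, e.g. (r, g, b, alpha)


@dataclass
class PartialSums:
    weight_sum: float        # sum of weights for sample locations
    weighted_values: list    # per-parameter sum of weight * value


def gaussian_weight(dx, dy, sigma=0.5):
    # Assumed weight function; any of the listed filters could be used.
    return math.exp(-(dx * dx + dy * dy) / (2.0 * sigma * sigma))


def manager_for(sample, bins_per_side=4):
    # Hypothetical mapping: unit-square bins, row-major index within a
    # 4 by 4 tile that repeats across screen space.
    bx = int(math.floor(sample.x)) % bins_per_side
    by = int(math.floor(sample.y)) % bins_per_side
    return by * bins_per_side + bx


def manager_step(k, acc, samples, center, radius, weight_fn):
    # One chain member: add its own partial sums to the accumulated sums.
    cx, cy = center
    for s in samples:
        if manager_for(s) != k:
            continue  # sample belongs to another manager's sub-set
        dx, dy = s.x - cx, s.y - cy
        if dx * dx + dy * dy > radius * radius:
            continue  # outside the convolution kernel
        w = weight_fn(dx, dy)
        acc.weight_sum += w
        for i, v in enumerate(s.values):
            acc.weighted_values[i] += w * v
    return acc


def filter_pixel(samples, center, radius, n_managers=16, n_params=4,
                 weight_fn=gaussian_weight):
    acc = PartialSums(0.0, [0.0] * n_params)
    for k in range(n_managers):  # pass accumulated sums down the chain
        acc = manager_step(k, acc, samples, center, radius, weight_fn)
    if acc.weight_sum == 0.0:
        return tuple(0.0 for _ in range(n_params))
    inv = 1.0 / acc.weight_sum   # last member normalizes by the weight sum
    return tuple(v * inv for v in acc.weighted_values)
```

Because each pixel value is the accumulated weighted sum divided by the accumulated weight sum, a field of samples that all carry the same parameter values should filter to those same values regardless of how the samples are split among the chain members.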